43 research outputs found
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model,
called C3L, which retrieves natural subgroups of given categories. In contrast
to other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as
a module in practical expert systems to detect clusters that combine expert
knowledge with the true distribution of the data. Moreover, it can be used to
improve the results of less flexible clustering techniques, such as projection
pursuit clustering. The paper presents an extensive theoretical analysis of
the model and a fast algorithm for its efficient optimization. Experimental
results show that C3L finds a high-quality clustering model, which can be
applied to discovering meaningful groups in partially classified data.
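The general idea of seeding a Gaussian mixture with partial labels can be sketched as follows. This is an illustrative numpy sketch only, not the C3L algorithm itself: it hard-assigns labelled points to their categories (zero leakage) and omits the user-defined leakage level described above; all names are hypothetical.

```python
import numpy as np

def seeded_gmm(X, labels, n_iter=50, eps=1e-6):
    """EM for an isotropic Gaussian mixture whose components are seeded
    from partially labelled points (label -1 = unlabelled)."""
    classes = np.unique(labels[labels >= 0])
    k = len(classes)
    # Seed each component's mean with the labelled points of one category.
    mu = np.stack([X[labels == c].mean(axis=0) for c in classes])
    var = np.full(k, X.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities under isotropic Gaussians.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        log_r = np.log(pi) - 0.5 * d2 / var - 0.5 * X.shape[1] * np.log(var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # Clamp labelled points to their own category (no leakage allowed).
        for j, c in enumerate(classes):
            r[labels == c] = 0.0
            r[labels == c, j] = 1.0
        # M-step: update weights, means and per-component variances.
        nk = r.sum(axis=0) + eps
        pi = nk / len(X)
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (nk * X.shape[1]) + eps
    return mu, r.argmax(axis=1)
```

C3L relaxes the hard clamp in the E-step: its leakage level bounds how far the final clustering may drift from the initial categorization instead of forbidding any drift.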
Set Aggregation Network as a Trainable Pooling Layer
Global pooling, such as max- or sum-pooling, is one of the key ingredients in
deep neural networks used for processing images, texts, graphs and other types
of structured data. Based on the recent DeepSets architecture proposed by
Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an
alternative global pooling layer. In contrast to typical pooling operators,
SAN allows us to embed a given set of features into a vector representation of
arbitrary size. We show that by adjusting the size of the embedding, SAN is
capable of preserving all the information from the input. In experiments, we
demonstrate that replacing the global pooling layer with SAN leads to an
improvement in classification accuracy. Moreover, it is less prone to
overfitting and can be used as a regularizer.
Comment: ICONIP 201
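The order-invariant set-to-vector mapping underlying this family of pooling layers can be sketched in a few lines of numpy. This follows the DeepSets pattern (per-element transform, permutation-invariant sum, post-aggregation transform) rather than SAN's exact parametrization, which is given in the paper; weight names are illustrative.

```python
import numpy as np

def set_aggregate(X_set, W_phi, b_phi, W_rho, b_rho):
    """Map a variable-size set of feature vectors (n, d_in) to a single
    embedding of size d_out, independently of n and of element order."""
    h = np.maximum(X_set @ W_phi + b_phi, 0.0)  # per-element transform (ReLU)
    pooled = h.sum(axis=0)                      # permutation-invariant sum
    return np.tanh(pooled @ W_rho + b_rho)      # post-aggregation transform
```

Unlike max- or sum-pooling, whose output size is fixed by the input feature dimension, the output size here is set by the shape of W_rho, which is what lets the embedding be made large enough to preserve the input information.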
Estimating conditional density of missing values using deep Gaussian mixture model
We consider the problem of estimating the conditional probability
distribution of missing values given the observed ones. We propose an approach
that combines the flexibility of deep neural networks with the simplicity of
Gaussian mixture models (GMMs). Given an incomplete data point, our neural
network returns the parameters of a Gaussian distribution (in the form of a
Factor Analyzers model) representing the corresponding conditional density. We
experimentally verify that our model provides a better log-likelihood than a
conditional GMM trained in a typical way. Moreover, the imputation obtained by
replacing missing values with the mean vector of our model looks visually
plausible.
Comment: A preliminary version of this paper appeared as an extended abstract
at the ICML 2020 Workshop on The Art of Learning with Missing Value
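The Gaussian conditioning step underlying such models can be worked through directly: for a joint Gaussian, the distribution of the missing block given the observed one is again Gaussian, with the mean serving as the imputation. A minimal numpy sketch (function name is illustrative):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, obs_idx, mis_idx, x_obs):
    """Return mean and covariance of x[mis_idx] | x[obs_idx] = x_obs
    for a joint Gaussian N(mu, Sigma)."""
    mu_o, mu_m = mu[obs_idx], mu[mis_idx]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    K = S_mo @ np.linalg.inv(S_oo)        # regression coefficients
    mu_cond = mu_m + K @ (x_obs - mu_o)   # conditional mean (the imputation)
    Sigma_cond = S_mm - K @ S_mo.T        # conditional covariance
    return mu_cond, Sigma_cond
```

For example, with mu = (0, 0), unit variances and correlation 0.5, observing x0 = 2 gives a conditional mean of 1.0 and a conditional variance of 0.75 for x1. In the paper's setting the neural network predicts the Gaussian parameters (in Factor Analyzer form) per incomplete input rather than using one fixed joint Gaussian.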
Pointed subspace approach to incomplete data
Incomplete data are often represented as vectors with the missing attributes filled in, joined with flag vectors indicating the missing components. In this paper, we generalize this approach and represent incomplete data as pointed affine subspaces. This allows us to perform various affine transformations of the data, such as whitening or dimensionality reduction. Moreover, this representation preserves the information about which coordinates were missing. To use our representation in practical classification tasks, we embed such generalized missing data into a vector space and define the scalar product of the embedding space. Our representation is easy to implement and can be used together with typical kernel methods. The performed experiments show that applying an SVM classifier to the proposed subspace approach yields highly accurate results.
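The baseline representation that the abstract generalizes (fill missing attributes, append a binary flag vector) can be sketched as follows; here the fill value is the per-column mean, which is one common choice, and the function name is illustrative:

```python
import numpy as np

def fill_and_flag(X):
    """X: (n, d) array with np.nan marking missing entries.
    Returns an (n, 2d) array: mean-imputed values concatenated
    with binary flags indicating which coordinates were missing."""
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)       # per-column mean of observed values
    filled = np.where(mask, col_means, X)   # replace NaNs with column means
    return np.hstack([filled, mask.astype(float)])
```

The pointed-subspace representation replaces the single filled vector with the whole affine subspace of points consistent with the observed coordinates, while still retaining the missingness information that the flag vector carries here.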
Subspace memory clustering
We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which allows us to efficiently divide a dataset D ⊂ R^N into k ∈ N pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to explicitly specify the dimensions of the groups: in fact, we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loève (PCA) transform. We test our method on typical data from the UCI repository and on data coming from real-life experiments.
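The single-cluster case mentioned above can be made concrete: describing each point by its first d principal coordinates and measuring the squared reconstruction error gives the compression cost that PCA minimizes. A minimal numpy sketch of this one-cluster view (not the full SuMC algorithm, which also allocates scalars across k clusters):

```python
import numpy as np

def pca_compression_error(X, d):
    """Mean squared error of reconstructing X from its top-d
    principal coordinates (the per-point cost of describing each
    point with d scalars in the PCA basis)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:d].T @ Vt[:d]   # project onto the top-d subspace
    return float(((Xc - proj) ** 2).mean())
```

SuMC's memory-compression criterion balances such per-cluster reconstruction costs against the number of scalars spent per point, which is how cluster dimensions emerge without being specified explicitly.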